The variation distance closure of an exponential family with a
convex set of canonical parameters is described, assuming no regularity
conditions. The tools are the concepts of convex core of a measure
and extension of an exponential family, introduced previously
by the authors, and a new concept of accessible faces of a convex
set. Two other closures related to the information divergence are also
characterized.

1. Introduction. Exponential families of probability measures (p.m.'s)
include many of the parametric families frequently used in statistics,
probability and information theory. Their mathematical theory has been worked
out to a considerable extent [1, 2, 3, 11]. Although limiting considerations
are important and do appear in the literature, less attention has been paid
to determining closures of exponential families.

For families supported by a finite or countable set, closures were considered
in [1], pages 154-156, and [2], pages 191-201, respectively, the latter
with regularity conditions. In the general case, different closure concepts
come into account. Our main result, Theorem 2 in Section 3, determines the
closure in variation distance (variation closure) of a full exponential family
and, more generally, of any subfamily with a convex set of canonical
parameters. Weak closures appear much harder to describe in general, but
Theorem 1 in Section 3 is a step in that direction.

Other closure concepts are based on the Kullback-Leibler $I$-divergence
(information divergence or relative entropy)

$$D(P\|Q) = \begin{cases} \int \ln\frac{dP}{dQ}\,dP, & \text{if } P \ll Q,\\ +\infty, & \text{otherwise.} \end{cases}$$

With the terminology of [7], these are the $I$-closure and reverse $I$-closure
($rI$-closure); early work related to the latter appeared in [3]. The $I$-closure
of a convex set $S$ of p.m.'s is relevant, for example, in large deviations
theory, where the conditional limit theorem for i.i.d. sequences on the
condition that the empirical distribution belongs to $S$ involves the "generalized
$I$-projection" to $S$, which is in the $I$-closure of $S$; see [4]. In the context of
exponential families, the $rI$-closure is of major statistical interest; for example,
when the likelihood function is bounded but its maximum is not attained,
a "generalized maximum likelihood estimate" can be introduced as a p.m.
that belongs to the $rI$-closure; see [7].

Formally, the variation closure $\mathrm{cl}_v(S)$, respectively, the $I$-closure $\mathrm{cl}_I(S)$
and the $rI$-closure $\mathrm{cl}_{rI}(S)$ of a set $S$ of p.m.'s on a given measurable space,
consists of all p.m.'s $P$ to which there exists a sequence $Q_n$ in $S$ such
that the total variation $|P - Q_n|$, respectively, the $I$-divergence $D(Q_n\|P)$
or $D(P\|Q_n)$, goes to zero as $n \to \infty$. The Pinsker inequality
$|P - Q|^2 \le 2D(P\|Q)$ implies that both $\mathrm{cl}_I(S)$ and $\mathrm{cl}_{rI}(S)$ are contained
in $\mathrm{cl}_v(S)$. For exponential families, the last inclusion gives a good
approximation to $\mathrm{cl}_{rI}(S)$; for example, all p.m.'s in $\mathrm{cl}_v(S)$ with mean
belong to $\mathrm{cl}_{rI}(S)$. This is one motivation for our study of variation closures,
in addition to intrinsic mathematical interest. Theorem 3 in Section 3
characterizes those p.m.'s in the variation closure that belong also to the
$rI$-closure. The $I$-closure is much easier to describe than the other closures
(see Corollary 2); in particular, full exponential families are $I$-closed. It
should be mentioned that the $I$- and $rI$-closures are not topological closure
operations because they are not idempotent. An example of an exponential
family $\mathcal{E}$ with $\mathrm{cl}_{rI}(\mathrm{cl}_{rI}(\mathcal{E}))$ strictly larger than $\mathrm{cl}_{rI}(\mathcal{E})$ is given
in [8]. On the other hand, the $I$- and $rI$-closures are sequential closures in
suitable topologies; see [9].
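
The inclusions of the $I$- and $rI$-closures in the variation closure rest only on the Pinsker inequality, which can be checked numerically on finite sets. The following is a minimal sketch, assuming p.m.'s given as probability vectors and the total variation taken as the sum of absolute differences (the convention under which $|P - Q|^2 \le 2D(P\|Q)$ holds); the function names are illustrative and not from the paper.

```python
import math

def total_variation(p, q):
    """Total variation |P - Q| as the sum of absolute differences of probabilities."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def i_divergence(p, q):
    """I-divergence D(P||Q); +infinity unless P is dominated by Q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def pinsker_gap(p, q):
    """2 D(P||Q) - |P - Q|^2, which the Pinsker inequality says is nonnegative."""
    return 2.0 * i_divergence(p, q) - total_variation(p, q) ** 2
```

For instance, `pinsker_gap([0.5, 0.5], [0.9, 0.1])` is positive, while two p.m.'s with disjoint supports have infinite divergence in both directions, so neither is in the $I$- or $rI$-closure of the other.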

Our attention is focused on exponential families that consist of p.m.'s on
$\mathbb{R}^d$ and have a canonical statistic equal to the identity mapping. Clearly,
by determining their variation, $I$- or $rI$-closures, the same problems are solved
for general exponential families of p.m.'s on any measurable space, with
$d$-dimensional canonical statistic, by mapping the family to one on $\mathbb{R}^d$ via the
canonical statistic.

A crucial construction is that of the extension $\mathrm{ext}(\mathcal{E})$ of a full exponential
family $\mathcal{E}$, introduced by the authors [5, 7] based on their concept of the
convex core of a measure on $\mathbb{R}^d$ [6]; see the definitions in Section 2. The
inclusion $\mathcal{E} \subseteq \mathrm{ext}(\mathcal{E})$ is strict unless no nontrivial supporting hyperplane of


CLOSURES OF EXPONENTIAL FAMILIES 




the (common) convex support of the p.m.'s in $\mathcal{E}$ has positive probability
under these p.m.'s, by [7], Remark 3. By Lemma 6 below, the variation
closure of $\mathcal{E}$ is contained in $\mathrm{ext}(\mathcal{E})$. A stronger result announced in [5], the
variation closedness of $\mathrm{ext}(\mathcal{E})$, follows as Corollary 3 from Theorem 4 that
deals with variation convergent sequences in $\mathrm{ext}(\mathcal{E})$.

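The inclusion $\mathrm{cl}_v(\mathcal{E}) \subseteq \mathrm{ext}(\mathcal{E})$ implies that if the subset $\mathrm{cl}_{rI}(\mathcal{E})$ of $\mathrm{cl}_v(\mathcal{E})$
is equal to $\mathrm{ext}(\mathcal{E})$ (e.g., if the domain of canonical parameters is the whole
$\mathbb{R}^d$; see [7], Lemma 6(ii)), then also $\mathrm{cl}_v(\mathcal{E}) = \mathrm{ext}(\mathcal{E})$. Moreover, since $\mathcal{E}$ is
$rI$-closed if and only if $\mathcal{E} = \mathrm{ext}(\mathcal{E})$ ([7], Corollary 2), the last condition is
necessary and sufficient also for the variation-closedness of $\mathcal{E}$.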

The cases just mentioned, although frequent in practice, are of secondary 
interest for our purposes. This paper is primarily devoted to the general case 
when all common regularity conditions are absent, although the assumption 
of steepness or even regularity (see [1], pages 116 and 117) would not lead 
to significant simplifications. The typical situation we have in mind is when 
the p.m.'s in $\mathcal{E}$ have both discrete and continuous components.

2. Preliminaries. 

2.1. Convex sets and faces. The closure and affine hull of a set $B \subseteq \mathbb{R}^d$
are denoted $\mathrm{cl}(B)$ and $\mathrm{aff}(B)$, and the relative interior [interior in the
relative topology of $\mathrm{aff}(B)$] is denoted $\mathrm{ri}(B)$. The linear subspace of $\mathbb{R}^d$ obtained
by shifting $\mathrm{aff}(B)$ to contain the origin is denoted $\mathrm{lin}(B)$. Orthogonal
projections to subspaces of the form $\mathrm{lin}(C)$, where $C \subseteq \mathbb{R}^d$ is a convex set, are
often needed in the sequel; they are denoted briefly as $\pi_C$ rather than $\pi_{\mathrm{lin}(C)}$.

A face of a nonempty convex set $C \subseteq \mathbb{R}^d$ is a nonempty convex subset $F$ of
$C$ such that whenever $tx + (1 - t)y \in F$ for some $x, y$ in $C$ and $0 < t < 1$, then
$x, y$ are in $F$. As in [10], but unlike in [12], the empty set is not considered to
be a face. The proper faces are those different from $C$ and the exposed faces
are $C$ itself and its intersections with the supporting hyperplanes of $C$. A
proper exposed face $F$ of $C$ is thus represented as $F = C \cap \{x : \langle\tau, x - a\rangle = 0\}$,
where $a \in C$ and $\tau \in \mathbb{R}^d$ is a unit vector such that $\langle\tau, x - a\rangle \le 0$ for each
$x \in C$. Obviously, there is no loss of generality in assuming $\tau \in \mathrm{lin}(C)$. Such
a vector $\tau$ exposes $F$ in $C$.

2.2. Convex support and core. A measure always means a finite Borel
measure on $\mathbb{R}^d$. The convex support $\mathrm{cs}(\mu)$ and the convex core $\mathrm{cc}(\mu)$ of $\mu$ are
defined, respectively, as the intersection of those convex closed and convex
Borel subsets $C$ of $\mathbb{R}^d$ which have full $\mu$-measure, $\mu(C) = \mu(\mathbb{R}^d)$. While the
former is a standard concept, the latter is of recent origin [6]. Let us recall
from [6] the key facts

(1)   $\mathrm{cs}(\mu) = \mathrm{cl}(\mathrm{cc}(\mu))$ and $\mathrm{cc}(\mu^{\mathrm{cl}(F)}) = F$ for faces $F$ of $\mathrm{cc}(\mu)$,




I. CSISZÁR AND F. MATÚŠ


where the restriction of a measure $\mu$ to a Borel subset $B$ of $\mathbb{R}^d$ is denoted $\mu^B$.
Note that the convex closed set $\mathrm{cs}(\mu)$ is of full $\mu$-measure, but the convex
set $\mathrm{cc}(\mu)$, though measurable by [6], Theorem 1, need not be. For brevity,
$\mathrm{lin}(\mu)$ is written instead of $\mathrm{lin}(\mathrm{cs}(\mu))$ and similarly with $\mathrm{ri}(\mu)$ and $\mathrm{aff}(\mu)$.

Lemma 1. A supporting hyperplane $H$ of $\mathrm{cs}(\mu)$ is of positive $\mu$-measure
if and only if $F = H \cap \mathrm{cc}(\mu)$ is nonempty. Moreover, $\mu(H \setminus \mathrm{cl}(F)) = 0$.

Proof. Using [6], Lemma 2(ii), $F = \mathrm{cc}(\mu^H)$. This and (1) give $\mathrm{cl}(F) = \mathrm{cs}(\mu^H)$,
whence both assertions follow. □


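2.3. Exponential families. The term exponential family without any
adjective means below a full exponential family based on a (nonzero) measure
$\mu$ on $\mathbb{R}^d$ with a canonical statistic equal to the identity mapping. This family
$\mathcal{E} = \mathcal{E}_\mu$ consists of the p.m.'s $Q_\vartheta$ with $\mu$-densities

(2)   $\dfrac{dQ_\vartheta}{d\mu}(x) = e^{\langle\vartheta, x\rangle - \Lambda(\vartheta)}$,

where

$$\Lambda(\vartheta) = \Lambda_\mu(\vartheta) = \ln \int_{\mathbb{R}^d} e^{\langle\vartheta, x\rangle}\,\mu(dx)$$

and the canonical parameter $\vartheta$ belongs to $\mathrm{dom}(\Lambda) = \{\vartheta : \Lambda(\vartheta) < \infty\}$. Note
that $\mu$ is not uniquely determined by the family $\mathcal{E}$. In particular, any member
of $\mathcal{E}$ could play the role of $\mu$; in this paper, however, $\mu$ is regarded as given.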

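Clearly, if $\vartheta \in \mathrm{dom}(\Lambda)$ and $\theta - \vartheta$ is orthogonal to $\mathrm{lin}(\mu)$ for some $\theta \in \mathbb{R}^d$,
then also $\theta \in \mathrm{dom}(\Lambda)$ and $Q_\vartheta = Q_\theta$. A bijective parametrization can be given
as

(3)   $\mathcal{E} = \{Q_\vartheta : \vartheta \in \Theta\}$,
      where $\Theta = \Theta_\mu = \mathrm{dom}(\Lambda_\mu) \cap \mathrm{lin}(\mu) = \pi_{\mathrm{lin}(\mu)}(\mathrm{dom}(\Lambda_\mu))$.

Here, $\Theta$ equals $\mathrm{dom}(\Lambda)$ if and only if $\mathrm{lin}(\mu) = \mathbb{R}^d$ [when (2) is called a minimal
representation]. For the purposes below, it is essential not to require that
condition and not to require $\mu$ to be a p.m., either.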

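Of main interest are subfamilies

$$\mathcal{E}_\Xi = \{Q_\vartheta : \vartheta \in \Xi\}, \qquad \Xi \subseteq \Theta,$$

of the full family $\mathcal{E}$, primarily when $\Xi$ is convex. In this case, $\mathcal{E}_\Xi$ is called a
canonically convex exponential family.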

The function $\Lambda$ is known to be convex and lower semicontinuous, thus
continuous on closed segments contained in $\mathrm{dom}(\Lambda)$. The following lemma
is a minor improvement of Lemma 23.5 in [3].

Lemma 2. Let $\vartheta_n$ be a sequence in $\mathrm{dom}(\Lambda)$ that converges to some $\vartheta \in \mathbb{R}^d$.

(i) If $Q_{\vartheta_n}$ converges weakly, then $\vartheta \in \mathrm{dom}(\Lambda)$ and $\Lambda(\vartheta_n)$ converges to
$\Lambda(\vartheta)$.

(ii) If $\vartheta \in \mathrm{dom}(\Lambda)$ and $\Lambda(\vartheta_n) \to \Lambda(\vartheta)$, then $Q_{\vartheta_n}$ converges to $Q_\vartheta$ in
variation.

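Proof. (i) If $Q_{\vartheta_n}$ converges weakly to some p.m. $P$, then for each
continuity set $B$ of $P$,

$$P(B) = \lim_{n\to\infty} Q_{\vartheta_n}(B) = \lim_{n\to\infty} \exp(-\Lambda(\vartheta_n)) \int_B \exp(\langle\vartheta_n, x\rangle)\,\mu(dx),$$

where, if $B$ is compact,

$$\int_B \exp(\langle\vartheta_n, x\rangle)\,\mu(dx) \to \int_B \exp(\langle\vartheta, x\rangle)\,\mu(dx)$$

by dominated convergence. If also $P(B) > 0$, then $\exp(-\Lambda(\vartheta_n))$ converges
to a positive number $c$. Hence

$$P(B) = c \int_B e^{\langle\vartheta, x\rangle}\,\mu(dx)$$

for each compact continuity set $B$ of $P$ and, consequently, for all Borel
sets $B$. When $B = \mathbb{R}^d$, it follows that $c = e^{-\Lambda(\vartheta)}$. Hence, $\vartheta \in \mathrm{dom}(\Lambda)$ and
$\Lambda(\vartheta_n)$ converges to $\Lambda(\vartheta)$.

(ii) Under the assumptions, the $\mu$-densities $\exp(\langle\vartheta_n, x\rangle - \Lambda(\vartheta_n))$ of $Q_{\vartheta_n}$
converge to the $\mu$-density $\exp(\langle\vartheta, x\rangle - \Lambda(\vartheta))$ of $Q_\vartheta$ pointwise, which is known
to imply $Q_{\vartheta_n} \to Q_\vartheta$ in variation. □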

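2.4. Extensions of exponential families. The restriction $\mu^{\mathrm{cl}(F)}$ of $\mu$ to the
closure of a face $F$ of $\mathrm{cc}(\mu)$ is a nonzero measure by (1). The exponential family
based on this restriction is denoted $\mathcal{E}^F$. It consists of the p.m.'s $Q_{F,\theta}$
defined as in (2) with $\mu$ and $\Lambda$ replaced with $\mu^{\mathrm{cl}(F)}$ and

$$\Lambda_F(\theta) = \ln \int_{\mathrm{cl}(F)} e^{\langle\theta, x\rangle}\,\mu(dx).$$

Obviously, $\mathrm{dom}(\Lambda) \subseteq \mathrm{dom}(\Lambda_F)$. The family $\mathcal{E}^F$ is bijectively parametrized
as

$$\mathcal{E}^F = \{Q_{F,\theta} : \theta \in \Theta_F\}, \quad \text{where } \Theta_F = \mathrm{dom}(\Lambda_F) \cap \mathrm{lin}(F) = \pi_F(\mathrm{dom}(\Lambda_F)),$$

similarly to (3), since $\mathrm{lin}(\mu^{\mathrm{cl}(F)}) = \mathrm{lin}(F)$ by (1). For each $Q_\vartheta \in \mathcal{E}$ with
$\vartheta \in \mathrm{dom}(\Lambda)$, its conditioning $Q_\vartheta(\cdot \mid \mathrm{cl}(F))$, equal to the restriction $Q_\vartheta^{\mathrm{cl}(F)}$
divided by $Q_\vartheta(\mathrm{cl}(F))$, belongs to $\mathcal{E}^F$. The simple fact that $Q_\vartheta(\cdot \mid \mathrm{cl}(F))$
coincides with the p.m. $Q_{F,\theta}$, where $\theta = \pi_F(\vartheta)$ is in $\Theta_F$, is repeatedly used in
the sequel. These conditionings of the p.m.'s in $\mathcal{E}$ exhaust $\mathcal{E}^F$ if and only if
the inclusion $\pi_F(\Theta) \subseteq \Theta_F$ holds with equality.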

The extension $\mathrm{ext}(\mathcal{E})$ of an exponential family $\mathcal{E} = \mathcal{E}_\mu$ is the union of
the families $\mathcal{E}^F$ over all faces $F$ of $\mathrm{cc}(\mu)$. Each $\mathcal{E}^F$ is called a component
of $\mathrm{ext}(\mathcal{E})$. A similar construction of a "boundary at infinity" of $\mathcal{E}$ which uses
faces of $\mathrm{cs}(\mu)$ rather than of $\mathrm{cc}(\mu)$ was proposed earlier [3]. Some crucial
assertions on an exponential family completed by its "boundary" were found
to be erroneous, but their analogues for $\mathrm{ext}(\mathcal{E})$ were found valid in [5, 7].
The reason is that $\mathrm{ext}(\mathcal{E})$ may be strictly larger than $\mathcal{E}$ completed by its
"boundary": By [6], Lemma 11, the latter consists of those components $\mathcal{E}^F$
that correspond to the proper exposed faces $F$ of $\mathrm{cc}(\mu)$.

2.5. Accessible faces. For any face $F$ of a convex set $C \subseteq \mathbb{R}^d$ there exists
a chain

$$C = F_0 \supset F_1 \supset \cdots \supset F_m = F,$$

not necessarily unique, such that $F_i$ is an exposed face of $F_{i-1}$, $1 \le i \le m$.
If for every $1 \le i \le m$ a unit vector $\tau_i \in \mathrm{lin}(F_{i-1})$ exposes $F_i$ in $F_{i-1}$, then
$\tau_1, \ldots, \tau_m$ is called an access sequence to the face $F$ of $C$; the access sequence
to $F = C$ is empty. Since $\tau_i \in \mathrm{lin}(F_{i-1})$ is orthogonal to $\mathrm{lin}(F_i)$, the vectors
of any nonempty access sequence are orthonormal.

Let $C$ and $\Xi$ be two nonempty convex subsets of $\mathbb{R}^d$. For our main result,
where the role of $C$ is played by $\mathrm{cc}(\mu)$ and the role of $\Xi$ is played by a
convex subset of $\Theta_\mu$, a new concept of $\Xi$-accessible faces of $C$ is suitable.
This concept involves a constraint on access sequences in terms of recession
cones of projections of $\mathrm{ri}(\Xi)$. Recall that the recession cone of a convex set
$\Xi \subseteq \mathbb{R}^d$ is

$$\mathrm{rec}(\Xi) = \{\tau : \vartheta + t\tau \in \Xi \text{ for all } \vartheta \in \Xi,\ t \ge 0\}.$$

By [12], Theorem 8.2 and Corollary 8.3.1, $\mathrm{rec}(\mathrm{ri}(\Xi)) = \mathrm{rec}(\mathrm{cl}(\Xi))$, and this
is a closed cone that contains $\mathrm{rec}(\Xi)$. Now, a face $F$ of the convex set $C$ is
$\Xi$-accessible if an access sequence $\tau_1, \ldots, \tau_m$ to $F$ exists such that

(4)   $\tau_i \in \mathrm{rec}(\pi_{F_{i-1}}(\mathrm{ri}(\Xi)))$, $1 \le i \le m$.

An access sequence to $F$ that satisfies (4) is called adapted to $\Xi$. It may seem
artificial that these notions depend on $\Xi$ only through its relative interior,
but if ri were omitted in (4), some later assertions would not hold; see
Example 3 in Section 3. Note that the empty sequence is trivially adapted;
thus $C$ itself is always a $\Xi$-accessible face of $C$.

Lemma 3. If $\Xi \subseteq \mathrm{lin}(C)$, an access sequence $\tau_1, \ldots, \tau_m$ to a proper face
$F$ of $C$ is adapted to $\Xi$ if and only if $\tau_1 \in \mathrm{rec}(\mathrm{ri}(\Xi))$ and for the face $F_1$ of
$C$ exposed by $\tau_1$ the access sequence $\tau_2, \ldots, \tau_m$ to the face $F$ of $F_1$ is adapted
to $\pi_{F_1}(\Xi)$.

Proof. By the hypothesis $\Xi \subseteq \mathrm{lin}(C)$, the set $\pi_{F_0}(\mathrm{ri}(\Xi))$ in the
condition (4) for $i = 1$ is equal to $\mathrm{ri}(\Xi)$. In the conditions for $2 \le i \le m$, the
sets $\pi_{F_{i-1}}(\mathrm{ri}(\Xi))$ are equal to the sets $\pi_{F_{i-1}}(\mathrm{ri}(\pi_{F_1}(\Xi)))$ that appear in the
analogue of (4) for the adaptedness of $\tau_2, \ldots, \tau_m$ to $\pi_{F_1}(\Xi)$, since the
operation ri interchanges with orthogonal projections ([12], Theorem 6.6), and
$\pi_{F_{i-1}} \pi_{F_1} = \pi_{F_{i-1}}$ if $i \ge 2$. □

Example 1 (Figure 1). Let $C \subseteq \mathbb{R}^3$ be the convex hull of the union of the
plane $H = \mathbb{R}^2 \times \{-1\}$ and the triangle $T$ with vertices $(1,0,0)$ and $(0,\pm 1,0)$,
and let $\Xi$ consist of those $\vartheta \in \mathbb{R}^3$ whose second coordinate is strictly between
$-1$ and $1$. In this example, $\Xi$-accessibility of the nine faces of $C$ is discussed.
The proper exposed faces of $C$ are $H$ and $T$. Of the six nonexposed faces of
$C$, equal to faces of $T$, consider

$$F = \{(1,0,0)\}, \quad G = \{(t, 1-t, 0) : 0 \le t \le 1\}, \quad S = \{(0,t,0) : |t| \le 1\}.$$

Since $\Xi$ is open, the relative interiors in (4) can be ignored. Note that the
recession cone of $\pi_C(\Xi) = \Xi$ is $\mathbb{R} \times \{0\} \times \mathbb{R}$ and the recession cone of $\pi_T(\Xi)$
is $\mathbb{R} \times \{(0,0)\}$. Let

$$\tau_1 = (0,0,1), \quad \tau_2 = \tfrac{1}{\sqrt{2}}(1,1,0), \quad \tau_3 = \tfrac{1}{\sqrt{2}}(1,-1,0)$$

and

$$\tau_2' = (1,0,0).$$

Since $\mathrm{rec}(\Xi)$ contains both $-\tau_1$ and $\tau_1$, the faces $H$ and $T$ are $\Xi$-accessible.
Both $\tau_1, \tau_2, \tau_3$ and $\tau_1, \tau_2'$ are access sequences to $F$, with corresponding chains
$C \supset T \supset G \supset F$ and $C \supset T \supset F$, respectively. Since $\mathrm{rec}(\pi_T(\Xi))$ contains
$\tau_2'$ but not $\tau_2$, the access sequence $\tau_1, \tau_2'$ is adapted to $\Xi$, whereas $\tau_1, \tau_2, \tau_3$ is
not. Due to the former, $F$ is a $\Xi$-accessible face of $C$. On the other hand, $G$
is not $\Xi$-accessible, because the only access sequence $\tau_1, \tau_2$ to $G$, with chain
$C \supset T \supset G$, has $\tau_2 \notin \mathrm{rec}(\pi_T(\Xi))$. Similarly, the segment $S$ is $\Xi$-accessible, but
its endpoints are not.
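
The second step of the access sequences in Example 1 can be checked mechanically. The following is a minimal sketch, assuming the triangle is represented by its three vertices, that a face of the triangle exposed by a vector is found as the set of maximizing vertices, and that the recession cone of the projection of the parameter set onto the plane of the triangle is the first coordinate axis, as stated in the example. All names are illustrative and not from the paper.

```python
import math

# Vertices of the triangle T of Example 1.
T_VERTICES = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]

def dot(u, v):
    """Euclidean inner product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))

def exposed_vertices(tau, vertices=T_VERTICES, tol=1e-12):
    """Vertices of the face of the triangle exposed by tau (maximizers of <tau, .>)."""
    best = max(dot(tau, v) for v in vertices)
    return [v for v in vertices if best - dot(tau, v) <= tol]

def in_rec_proj_T(tau, tol=1e-12):
    """Membership in R x {(0, 0)}, the recession cone of the projected parameter set."""
    return abs(tau[1]) <= tol and abs(tau[2]) <= tol

r = 1.0 / math.sqrt(2.0)
to_F = (1.0, 0.0, 0.0)   # exposes the vertex F = {(1, 0, 0)} in T
to_G = (r, r, 0.0)       # exposes the edge G in T
to_S = (-1.0, 0.0, 0.0)  # exposes the edge S in T
```

Here `to_F` and `to_S` lie in the recession cone while `to_G` does not, in agreement with the statement that $F$ and $S$ are accessible and $G$ is not.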

2.6. Partial means. When studying $rI$-closures of exponential families,
p.m.'s that do not have a mean require special attention. The following
simple concept is useful: for any p.m. $P$ on $\mathbb{R}^d$, write

$$M(P) = \{\tau \in \mathbb{R}^d : \langle\tau, \cdot\rangle \text{ is } P\text{-integrable}\}$$

and define the partial mean $m(P)$ as the unique element of the linear space
$M(P)$ with

$$\int_{\mathbb{R}^d} \langle\tau, x\rangle\,P(dx) = \langle\tau, m(P)\rangle, \qquad \tau \in M(P).$$

Note that $M(P) = \mathbb{R}^d$ if and only if $P$ has a mean, in which case $m(P)$ equals
the mean.

The following lemma is well known, but usually stated in less generality. 

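Lemma 4. For $\vartheta \in \mathrm{dom}(\Lambda)$ and a unit vector $\tau$ such that $\vartheta + t\tau \in
\mathrm{dom}(\Lambda)$ for some $t > 0$, the integral $\int \langle\tau, x\rangle\,Q_\vartheta(dx)$ exists, either finite or
$-\infty$. This integral equals the directional derivative of $\Lambda$ at $\vartheta$ in the direction
$\tau$.

Proof. The directional derivative, that is, the right derivative of the
function $t \mapsto \Lambda(\vartheta + t\tau)$ at $t = 0$, equals

$$\frac{1}{\int e^{\langle\vartheta, x\rangle}\,\mu(dx)}\ \lim_{t \downarrow 0} \int \frac{e^{t\langle\tau, x\rangle} - 1}{t}\, e^{\langle\vartheta, x\rangle}\,\mu(dx).$$

The ratio $(e^{t\langle\tau, x\rangle} - 1)/t$ decreases to $\langle\tau, x\rangle$ when $t$ decreases to zero, and the
assertion follows by the monotone convergence. □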


The following lemma shows the relevance of partial means for exponential 
families. 


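Lemma 5. The $I$-divergence $D(Q_\theta\|Q_\vartheta)$ of p.m.'s in $\mathcal{E}$ is finite if and
only if $\theta - \vartheta \in M(Q_\theta)$, in which case

$$D(Q_\theta\|Q_\vartheta) = \langle\theta - \vartheta, m(Q_\theta)\rangle - \Lambda(\theta) + \Lambda(\vartheta).$$

Proof. By definition,

$$D(Q_\theta\|Q_\vartheta) = \int_{\mathbb{R}^d} \ln\frac{dQ_\theta}{dQ_\vartheta}(x)\,Q_\theta(dx) = \int_{\mathbb{R}^d} \langle\theta - \vartheta, x\rangle\,Q_\theta(dx) - \Lambda(\theta) + \Lambda(\vartheta).$$

The last integral is finite if and only if $\theta - \vartheta$ belongs to $M(Q_\theta)$, in which
case it equals $\langle\theta - \vartheta, m(Q_\theta)\rangle$. □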
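
The divergence formula of Lemma 5 can be verified numerically in the simplest setting. The following is a minimal sketch, assuming a one-dimensional family ($d = 1$) based on a finitely supported measure given as a list of (atom, weight) pairs with positive weights, so that every member has a mean; the helper names are illustrative and not from the paper.

```python
import math

def log_laplace(theta, atoms):
    """Lambda(theta) = ln sum_x w * exp(theta * x) for mu = sum of w * delta_x."""
    return math.log(sum(w * math.exp(theta * x) for x, w in atoms))

def member(theta, atoms):
    """Probabilities of Q_theta on the atoms, from the density exp(theta*x - Lambda(theta))."""
    lam = log_laplace(theta, atoms)
    return [w * math.exp(theta * x - lam) for x, w in atoms]

def divergence_direct(theta, vartheta, atoms):
    """D(Q_theta || Q_vartheta) computed from the definition of the I-divergence."""
    p = member(theta, atoms)
    q = member(vartheta, atoms)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def divergence_lemma5(theta, vartheta, atoms):
    """(theta - vartheta) * mean(Q_theta) - Lambda(theta) + Lambda(vartheta)."""
    p = member(theta, atoms)
    mean = sum(pi * x for pi, (x, _) in zip(p, atoms))
    return (theta - vartheta) * mean - log_laplace(theta, atoms) + log_laplace(vartheta, atoms)
```

With, say, `atoms = [(-1.0, 1.0), (0.0, 2.0), (2.0, 0.5)]`, the two functions agree up to rounding for any pair of parameters. The interesting case of the lemma, where the partial mean is needed because the mean does not exist, cannot occur for finitely supported measures and is not covered by this sketch.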

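This lemma is also used with $\mathcal{E}$ replaced by a component $\mathcal{E}^F$ of $\mathrm{ext}(\mathcal{E})$,
combined with the obvious identity

(5)   $D(P\|Q) = D(P\|Q(\cdot \mid \mathrm{cl}(F))) - \ln Q(\mathrm{cl}(F))$, $P \in \mathcal{E}^F$, $Q \in \mathcal{E}$.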

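3. Main results. Below, $\mathcal{E} = \mathcal{E}_\mu$ always denotes an exponential family in
the sense of Section 2.3, $Q_\vartheta$ with $\vartheta \in \Theta = \mathrm{dom}(\Lambda) \cap \mathrm{lin}(\mu)$ denotes a p.m. in
$\mathcal{E}$ and $\mathcal{E}_\Xi$ with $\Xi \subseteq \Theta$ denotes a subfamily of $\mathcal{E}$.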

Theorem 1. If a sequence of p.m.'s $Q_{\vartheta_n}$ with $\vartheta_n$ in $\Theta$ converges weakly
to a p.m. $P$, then one of the following two alternatives takes place:

(i) The sequence $\vartheta_n$ converges to an element $\vartheta$ of $\Theta$, $P = Q_\vartheta$ and $Q_{\vartheta_n} \to
P$ even in variation distance.

(ii) The norm of $\vartheta_n$ goes to $\infty$, a proper exposed face $G$ of $\mathrm{cs}(\mu)$ exists
such that $P(G) = 1$, and the limit of any convergent subsequence of $\vartheta_n/\|\vartheta_n\|$
exposes such a face of $\mathrm{cs}(\mu)$.

The proof is given in Section 4. It follows from Theorem 1 that the weak
convergence of a sequence $Q_{\vartheta_n}$ to some $Q_\vartheta$ in $\mathcal{E}$ implies $\vartheta_n \to \vartheta$; see [1],
Theorem 8.3, for a direct proof. A consequence of this and Lemma 2 is
stated for reference purposes.
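
The second alternative of Theorem 1 is already visible in the smallest possible case. The following is a minimal sketch, assuming the illustrative measure on the real line that gives mass 1 to each of the points 0 and 1 (this measure is not taken from the paper): the member with parameter $\vartheta$ gives probability $e^\vartheta/(1 + e^\vartheta)$ to the point 1, and as $\vartheta \to \infty$ the family converges in variation to the point mass at 1, which sits on the face $\{1\}$ of the convex support $[0,1]$ exposed by $\tau = 1$. Total variation is taken as the sum of absolute differences.

```python
import math

def q_mass_at_one(vartheta):
    """Probability that Q_vartheta assigns to the atom 1 when mu = delta_0 + delta_1."""
    return 1.0 / (1.0 + math.exp(-vartheta))

def variation_to_point_mass(vartheta):
    """Total variation between Q_vartheta and the point mass at 1."""
    p1 = q_mass_at_one(vartheta)
    # The two atoms contribute |1 - p1| and |0 - (1 - p1)|.
    return abs(1.0 - p1) + abs(0.0 - (1.0 - p1))
```

The distance equals 1 at $\vartheta = 0$ and decreases to 0 as $\vartheta$ grows, while the limit is not a member of the family itself.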

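Corollary 1. For a sequence $\vartheta_n$ in $\Theta$ and $\vartheta$ in $\Theta$ the following
assertions are equivalent.

(i) Convergence $Q_{\vartheta_n} \to Q_\vartheta$ weakly.

(ii) Convergence $Q_{\vartheta_n} \to Q_\vartheta$ in variation.

(iii) Convergences $\vartheta_n \to \vartheta$ and $\Lambda(\vartheta_n) \to \Lambda(\vartheta)$.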

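Corollary 2. For any subset $\Xi$ of $\Theta$,

$$\mathcal{E}_{\mathrm{cl}(\Xi) \cap \Theta} \supseteq \mathrm{cl}_v(\mathcal{E}_\Xi) \cap \mathcal{E} \supseteq \mathrm{cl}_I(\mathcal{E}_\Xi).$$

When $\Xi$ is convex, the equalities take place.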

Proof. The first inclusion follows from Corollary 1. For the second one,
if $D(Q_{\vartheta_n}\|P) \to 0$, the sequence $Q_{\vartheta_n}$ converges in variation to $P$; thus $\mathrm{cl}_v(\mathcal{E}_\Xi)$
contains $\mathrm{cl}_I(\mathcal{E}_\Xi)$. Moreover, the alternative (i) holds in Theorem 1, since
otherwise $D(Q_{\vartheta_n}\|P) = +\infty$ for all $n$. This proves that $\mathcal{E}$ contains $\mathrm{cl}_I(\mathcal{E}_\Xi)$.

Supposing $\Xi$ is convex, let $\vartheta \in \mathrm{cl}(\Xi) \cap \Theta$ and let $\tau$ be a unit vector such that
$\vartheta + t\tau$ belongs to $\mathrm{ri}(\Xi)$ for some $t > 0$. It suffices to show that $D(Q_{\vartheta_n}\|Q_\vartheta) \to
0$ for $\vartheta_n = \vartheta + t_n\tau$ with $t_n$ decreasing to zero. By Lemma 4, $\tau \in M(Q_{\vartheta_n})$ and
then Lemma 5 implies

$$D(Q_{\vartheta_n}\|Q_\vartheta) = t_n\langle\tau, m(Q_{\vartheta_n})\rangle - \Lambda(\vartheta_n) + \Lambda(\vartheta).$$

Since $\Lambda(\vartheta_n) \to \Lambda(\vartheta)$ and the sequence $\langle\tau, m(Q_{\vartheta_n})\rangle$ is decreasing by Lemma 4,
the claim follows. □

The following technical assertion, which is crucial for Theorem 2, is proved
in Section 5. The notion of $\Xi$-accessibility enters the scene.

Lemma 6. If a sequence of p.m.'s in a canonically convex exponential
family $\mathcal{E}_\Xi$ converges in variation distance to a p.m. $P$, then there exists a
$\Xi$-accessible face $F$ of $\mathrm{cc}(\mu)$ such that $P$ belongs to $\mathcal{E}^F$.

Lemma 6 enables us to conclude that the variation closure of a canonically
convex family $\mathcal{E}_\Xi$ is contained in the union of those components $\mathcal{E}^F$
of $\mathrm{ext}(\mathcal{E})$ which correspond to the $\Xi$-accessible faces $F$ of $\mathrm{cc}(\mu)$. For our
main result, still another tool is needed, namely a special convergence
concept: A sequence of p.m.'s $Q_n$ is said to converge neatly to a p.m. $P$ if
the $P$-dominated component of $Q_n$ has constant $P$-density $c_n$ and $c_n \to 1$
or, equivalently, if $Q_n(\cdot \mid \mathrm{cs}(P))$ equals $P$ when defined and $Q_n(\mathrm{cs}(P)) \to 1$.
Obviously, the neat convergence implies variation convergence, and even $rI$
convergence, due to $D(P\|Q_n) = -\ln c_n$.
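
The identity behind neat convergence can be illustrated on a finite set. The following is a minimal sketch, assuming p.m.'s given as probability vectors and a sequence member built as $c$ times $P$ plus the remaining mass $1 - c$ placed on one atom outside the support of $P$; the function names are illustrative and not from the paper.

```python
import math

def neat_member(p, c, extra_index):
    """Q = c * P on the support of P plus mass (1 - c) at an atom where P vanishes."""
    if p[extra_index] != 0.0:
        raise ValueError("the extra atom must lie outside the support of P")
    q = [c * pi for pi in p]
    q[extra_index] += 1.0 - c
    return q

def reverse_divergence(p, q):
    """D(P||Q) for probability vectors, assuming Q is positive wherever P is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def total_variation(p, q):
    """Total variation as the sum of absolute differences of probabilities."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))
```

For `p = [0.3, 0.7, 0.0]` and `c = 0.9`, the divergence equals $-\ln 0.9$ and the total variation equals $2(1 - c) = 0.2$, so both go to zero as $c \to 1$.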

Lemma 7. For a convex subset $\Xi$ of $\Theta$ and a $\Xi$-accessible face $F$ of $\mathrm{cc}(\mu)$,
each p.m. $Q_{F,\theta}$ with $\theta \in \pi_F(\mathrm{ri}(\Xi))$ is the neat limit of a sequence from $\mathcal{E}_{\mathrm{ri}(\Xi)}$.

This lemma is proved in Section 5. 

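Theorem 2. The variation closure of a canonically convex exponential
family $\mathcal{E}_\Xi$ is

$$\mathrm{cl}_v(\mathcal{E}_\Xi) = \bigcup \{Q_{F,\theta} : \theta \in \mathrm{cl}(\pi_F(\Xi)) \cap \Theta_F\},$$

where the union runs over all $\Xi$-accessible faces $F$ of $\mathrm{cc}(\mu)$.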

Proof. Suppose $P$ is the limit in variation distance of a sequence $Q_n$
in $\mathcal{E}_\Xi$. By Lemma 6, $P \in \mathcal{E}^F$ for a $\Xi$-accessible face $F$ of $\mathrm{cc}(\mu)$. Then
$P(\mathrm{cl}(F)) = 1$ implies $Q_n(\mathrm{cl}(F)) \to 1$, and thus the sequence $Q_n(\cdot \mid \mathrm{cl}(F))$ in
$\mathcal{E}^F$ also converges to $P$ in variation distance. These conditioned p.m.'s are
of the form $Q_{F,\theta_n}$ with $\theta_n \in \pi_F(\Xi)$; thus Corollary 2 implies $P = Q_{F,\theta}$ for some
$\theta$ in $\mathrm{cl}(\pi_F(\Xi)) \cap \Theta_F$. This proves the inclusion $\subseteq$.

Conversely, if $F$ is a $\Xi$-accessible face of $\mathrm{cc}(\mu)$, Lemma 7 implies that
$\mathrm{cl}_v(\mathcal{E}_\Xi)$ contains the p.m.'s $Q_{F,\theta}$ with $\theta \in \pi_F(\mathrm{ri}(\Xi))$. Thus, by Corollary 2
and the convexity of $\pi_F(\mathrm{ri}(\Xi))$,

$$\mathrm{cl}_v(\mathcal{E}_\Xi) \supseteq \{Q_{F,\theta} : \theta \in \mathrm{cl}(\pi_F(\mathrm{ri}(\Xi))) \cap \Theta_F\}.$$

Since ri interchanges with projections, the inclusion $\supseteq$ follows. □

Example 2. Using the notation of Example 1, let $\mu$ be the sum of the
measure that gives mass 1 to each vertex of $T$ and of the p.m. on the plane $H$
with density $\exp(-x_1^2)\exp(-|x_2|)$ w.r.t. the Lebesgue measure on $H$. Then
$\mathrm{cc}(\mu) = C$ and $\Theta_\mu = \mathrm{dom}(\Lambda_\mu)$ coincides with $\Xi$ of Example 1, defined by
restricting the second coordinate to be between $-1$ and $1$. By Theorem 2,
the variation closure of the full family $\mathcal{E} = \mathcal{E}_\Theta$ intersects five out of the nine
components of $\mathrm{ext}(\mathcal{E})$. In addition to the full families $\mathcal{E}$, $\mathcal{E}^H$ and $\mathcal{E}^F$, the
latter consisting of the point mass at $(1,0,0)$, $\mathrm{cl}_v(\mathcal{E})$ contains also some
p.m.'s from $\mathcal{E}^T$ and $\mathcal{E}^S$. Note that $\mathrm{cl}_v(\mathcal{E})$ intersects $\mathcal{E}^F$ but not $\mathcal{E}^G$, although
$F \subset G$.

Part (ii) of the following Theorem 3 gives a necessary and sufficient
condition for a p.m. $P$ in $\mathrm{cl}_v(\mathcal{E}_\Xi)$ to belong also to $\mathrm{cl}_{rI}(\mathcal{E}_\Xi)$, even when $\Xi \subseteq \Theta$
is not convex. When $\Xi$ is convex, this condition can be effectively verified
by Theorem 2. Part (iii) provides a simple sufficient condition which has a
direct proof. A trivial consequence of Theorem 3 is that $\mathrm{cl}_{rI}(\mathcal{E}_\Xi)$ contains
all p.m.'s in $\mathrm{cl}_v(\mathcal{E}_\Xi)$ that have a mean.

Theorem 3. (i) Suppose a sequence $Q_{\vartheta_n}$, $\vartheta_n \in \Theta$, of p.m.'s in $\mathcal{E}$
converges in variation to a p.m. $P = Q_{F,\theta}$, $\theta \in \Theta_F$, in a component $\mathcal{E}^F$ of $\mathrm{ext}(\mathcal{E})$.
Then $D(P\|Q_{\vartheta_n})$ goes to zero if it is eventually finite, which is equivalent to
$\vartheta_n \in \theta + M(P)$ eventually.

(ii) A p.m. $P = Q_{F,\theta}$ belongs to the $rI$-closure of a subfamily $\mathcal{E}_\Xi$ of $\mathcal{E}$ if
and only if it belongs to the variation closure of $\mathcal{E}_{\Xi \cap (\theta + M(P))}$.

(iii) The $rI$-closure of a canonically convex exponential family $\mathcal{E}_\Xi$ contains
all p.m.'s $P$ in the variation closure of $\mathcal{E}_\Xi$ that satisfy $D(P\|Q) < \infty$ for some
$Q \in \mathcal{E}_{\mathrm{ri}(\Xi)}$.


Proof. (i) Consider first the case when $\mathcal{E}^F = \mathcal{E}$; that is, $P = Q_\vartheta$ for
some $\vartheta \in \Theta$. Then $\vartheta_n \to \vartheta$ and $\Lambda(\vartheta_n) \to \Lambda(\vartheta)$ by Corollary 1. Thus, the
assertion for this case follows by Lemma 5.

When $F$ is a proper face of $\mathrm{cc}(\mu)$, note that the variation convergence
assumption implies $Q_{\vartheta_n}(\mathrm{cl}(F)) \to P(\mathrm{cl}(F)) = 1$. Hence, (5) shows that $D(P\|Q_{\vartheta_n})$
goes to zero if and only if $D(P\|Q_{\vartheta_n}(\cdot \mid \mathrm{cl}(F)))$ does. Moreover $Q_{\vartheta_n}(\cdot \mid \mathrm{cl}(F))$,
equal to $Q_{F,\theta_n}$ with $\theta_n = \pi_F(\vartheta_n)$, also converges in variation to $P$.
It follows, applying the result in the first case to $\mathcal{E}^F$ in the role of $\mathcal{E}$,
that $D(P\|Q_{\vartheta_n}) \to 0$ if it is eventually finite. Also by (5), the finiteness
of $D(P\|Q_{\vartheta_n})$ is equivalent to that of $D(P\|Q_{\vartheta_n}(\cdot \mid \mathrm{cl}(F)))$ and, therefore, by
Lemma 5, to $\theta - \theta_n \in M(P)$. Since $\vartheta_n - \theta_n$ is orthogonal to $\mathrm{lin}(F)$, it belongs
to $M(P)$; thus the last condition is equivalent to $\vartheta_n \in \theta + M(P)$.

(ii) This follows directly from (i).

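(iii) Suppose $P = Q_{F,\theta} \in \mathrm{cl}_v(\mathcal{E}_\Xi)$, where $F$ is a $\Xi$-accessible face of $\mathrm{cc}(\mu)$
and $\theta$ belongs to $\mathrm{cl}(\pi_F(\Xi)) \cap \Theta_F$; see Theorem 2. If $D(P\|Q_{\vartheta_0})$ is finite for
some $\vartheta_0 \in \mathrm{ri}(\Xi)$, then as in the proof of part (i), $D(P\|Q_{F,\theta_0})$ is also finite,
where $\theta_0 = \pi_F(\vartheta_0)$ and $Q_{F,\theta_0} = Q_{\vartheta_0}(\cdot \mid \mathrm{cl}(F))$. Then for $\theta_n = t_n\theta_0 + (1 - t_n)\theta$
with $t_n \downarrow 0$, $D(P\|Q_{F,\theta_n})$ is also finite by Lemma 5 and $Q_{F,\theta_n} \to Q_{F,\theta}$ in
variation by Corollary 1. It follows by part (i) that $D(P\|Q_{F,\theta_n}) \to 0$.

Since $\theta_0 \in \pi_F(\mathrm{ri}(\Xi)) = \mathrm{ri}(\pi_F(\Xi))$ and $\theta \in \mathrm{cl}(\pi_F(\Xi))$ imply $\theta_n \in \mathrm{ri}(\pi_F(\Xi))$,
Lemma 7 gives that each $Q_{F,\theta_n}$ is the neat limit of a sequence in $\mathcal{E}_{\mathrm{ri}(\Xi)}$.
Thus to each $\theta_n$ there exists $\vartheta_n \in \mathrm{ri}(\Xi)$ such that $Q_{F,\theta_n} = Q_{\vartheta_n}(\cdot \mid \mathrm{cl}(F))$ and
$Q_{\vartheta_n}(\mathrm{cl}(F))$ is arbitrarily close to 1. From this and $D(P\|Q_{F,\theta_n}) \to 0$ the claim
$D(P\|Q_{\vartheta_n}) \to 0$ follows by (5). □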


The following example illustrates a use of Theorem 3(ii) and Theorem 2
when deciding whether a p.m. belongs to the $rI$-closure of a canonically
convex exponential family. It also illustrates why $\mathrm{ri}(\Xi)$ rather than $\Xi$ appears
in the definition (4) of $\Xi$-accessibility and in Lemma 7.


Example 3 (Figure 2). Let $\mu$ be the measure on $\mathbb{R}^3$ equal to the sum
of the point mass at $(-1,0,0)$, the image $P$ under $t \mapsto (0,t,0)$ of the p.m.
with density $1/t^2$ on the half-line $t > 1$ and the image under $t \mapsto (t, t^2, -1)$ of
the p.m. with density $2/t^3$ on the same half-line. Then

(6)   $\Lambda(\vartheta) = \ln\Big[\exp(-\vartheta_1) + \int_1^\infty \exp(\vartheta_2 t)\,\dfrac{dt}{t^2} + \int_1^\infty \exp(\vartheta_1 t + \vartheta_2 t^2 - \vartheta_3)\,\dfrac{2\,dt}{t^3}\Big]$,   $\vartheta = (\vartheta_1, \vartheta_2, \vartheta_3)$,

$\mathrm{dom}(\Lambda)$ is given by $\vartheta_2 < 0$ or $\vartheta_2 = 0$, $\vartheta_1 \le 0$, and $\Theta = \mathrm{dom}(\Lambda)$. Consider
$\Xi = \Theta$, thus $\mathcal{E}_\Xi = \mathcal{E}$, and the face $F = \{(0,t,0) : t > 1\}$ of $\mathrm{cc}(\mu)$. This $F$ is not
exposed and the unique access sequence to it is $\tau_1 = (0,0,1)$ and $\tau_2 = (1,0,0)$.
This access sequence is adapted to $\Xi$ but it would not be if $\Xi$ rather than
$\mathrm{ri}(\Xi)$ had been used in the definition (4). Since $\pi_F(\Xi) = \{(0,t,0) : t \le 0\} =
\Theta_F$ and $F$ is $\Xi$-accessible, Theorem 2 gives that $\mathcal{E}^F \subseteq \mathrm{cl}_v(\mathcal{E})$. In particular,
$\mathrm{cl}_v(\mathcal{E})$ contains $P$ that equals $Q_{F,\theta}$ with $\theta = (0,0,0) \in \pi_F(\Xi)$. On the other
hand, as

$$M(P) = \mathbb{R} \times \{0\} \times \mathbb{R},$$

$$\Xi \cap (\theta + M(P)) = \{(\vartheta_1, 0, \vartheta_3) : \vartheta_1 \le 0\}$$

and the unique access sequence to $F$ is not adapted to the latter set, $P$
is not in the variation closure of $\mathcal{E}_{\Xi \cap (\theta + M(P))}$ by Theorem 2. Consequently
$P \notin \mathrm{cl}_{rI}(\mathcal{E})$ by Theorem 3(ii). Thus, $P$ cannot be the neat limit of any
sequence in $\mathcal{E}_\Xi$, showing that Lemma 7 is not valid when $\mathrm{ri}(\Xi)$ is replaced by
$\Xi$.

Our final results address variation convergence of arbitrary sequences in
$\mathrm{ext}(\mathcal{E})$.

Theorem 4. If a sequence $Q_n$ in $\mathrm{ext}(\mathcal{E})$ converges in variation distance
to a p.m. $P$, then $P$ belongs to a component $\mathcal{E}^F$ of $\mathrm{ext}(\mathcal{E})$, for sufficiently
large $n$ the face $F_n$ of $\mathrm{cc}(\mu)$ with $Q_n \in \mathcal{E}^{F_n}$ contains $F$, and the conditioned
p.m.'s $Q_n(\cdot \mid \mathrm{cl}(F))$ belong to $\mathcal{E}^F$ and also converge to $P$ in variation distance.

Theorem 4 is proved in Section 6. 

Corollary 3. The extension $\mathrm{ext}(\mathcal{E})$ of an exponential family $\mathcal{E}$ is
variation-closed.

Corollary 3 strengthens [7], Theorem 2, on the $rI$-closedness of $\mathrm{ext}(\mathcal{E})$.
Note that a family $\mathcal{E}$ completed by its "boundary at infinity" in the sense
of [3] is not necessarily $rI$-closed, let alone variation-closed, contrary to [3],
Lemma 23.7; see [6], Example 3.

Corollary 4. If a sequence $Q_n$ in $\mathrm{ext}(\mathcal{E})$ converges in variation
distance to a p.m. $P$ and $D(P\|Q_n)$ is eventually finite, then $D(P\|Q_n) \to 0$.

The eventual finiteness takes place, in particular, if $P$ has a mean.

Proof of Corollary 4. By Theorem 4, the variation convergence
of $Q_n \in \mathcal{E}^{F_n}$ to $P$ implies $P \in \mathcal{E}^F$, $F \subseteq F_n$ eventually and $Q_n(\cdot \mid \mathrm{cl}(F)) \to P$
in variation. Hence, the proof can be completed similarly to that of
Theorem 3(i) using (5) with $\mathcal{E}^{F_n}$ playing the role of $\mathcal{E}$. □



Fig. 2.

4. Weak convergence in exponential families. In this section Theorem 1
is proved. For its second alternative, a corollary of the following lemma is
needed. Note that

$$\rho(a) = \inf\{\|y - a\| : y \in \mathrm{aff}(\mu) \setminus \mathrm{cs}(\mu)\}$$

is obviously positive for any element $a$ of $\mathrm{ri}(\mu)$.

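Lemma 8. For $a \in \mathrm{ri}(\mu)$ and $0 < s < \rho(a)$ there exists a positive constant
$C$ such that the inequality

(7)   $\langle\vartheta, b - a\rangle \ge r\|\vartheta\| - C e^{-s\|\vartheta\|}\big[r\|\vartheta\| e^{r\|\vartheta\|} + 1\big]$

holds for all $0 < r < s$, $\vartheta \in \Theta$ and $b$ satisfying $\langle\vartheta, b\rangle = \int_{\mathbb{R}^d} \langle\vartheta, x\rangle\,Q_\vartheta(dx)$.

In particular, (7) holds when $b$ is the mean of $Q_\vartheta$. What actually is used
is the following consequence of Lemma 8.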

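Corollary 5. Let $\vartheta_n$ be a sequence in $\Theta$ such that each p.m. $Q_{\vartheta_n}$ has
a mean $b_n$. If $\|\vartheta_n\| \to \infty$, $\vartheta_n/\|\vartheta_n\| \to \tau$ and $b_n \to b$, then $\langle\tau, b - a\rangle \ge \rho(a)$
for each $a \in \mathrm{ri}(\mu)$.

Proof. For $a \in \mathrm{ri}(\mu)$ and $0 < r < s < \rho(a)$, (7) implies

$$\langle\vartheta_n, b_n - a\rangle \ge r\|\vartheta_n\| - C\big[r\|\vartheta_n\|\exp(-(s - r)\|\vartheta_n\|) + \exp(-s\|\vartheta_n\|)\big].$$

Dividing both sides by $\|\vartheta_n\|$ and letting $n \to \infty$, it follows that $\langle\tau, b - a\rangle \ge r$
for all $0 < r < s < \rho(a)$. □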

For the proof of Lemma 8 the following auxiliary lemma, a simple
refinement of known assertions (see [3], Lemma 21.8, and [11], proof of Theorem
3.1), is needed. Denote

$$A^\vartheta_{a,s} = \{x \in \mathbb{R}^d : \langle\vartheta, x - a\rangle \ge s\|\vartheta\|\}, \qquad \vartheta, a \in \mathbb{R}^d,\ s \ge 0,$$

which is a closed half-space of distance $s$ from $a$ when $\vartheta \ne 0$.


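Lemma 9. If $a \in \mathrm{ri}(\mu)$ and $0 < s < \rho(a)$, then

$$c_{a,s} = \inf\{\mu(A^\vartheta_{a,s}) : \vartheta \in \mathrm{lin}(\mu)\}$$

is positive and

(8)   $\Lambda(\vartheta) - \langle\vartheta, a\rangle - s\|\vartheta\| \ge \ln c_{a,s}$,   $\vartheta \in \mathrm{lin}(\mu)$.

Proof. Let $\mu(A^{\vartheta_n}_{a,s}) \downarrow c_{a,s}$ for a sequence $\vartheta_n$ in $\mathrm{lin}(\mu)$ that can be
supposed to consist of unit vectors converging to some $\tau \in \mathrm{lin}(\mu)$. Each $x \in \mathbb{R}^d$
with $\langle\tau, x - a\rangle > s$ belongs to $A^{\vartheta_n}_{a,s}$ eventually; thus the open half-space given
by the last inequality is covered by the union over all $m$ of the intersections
$\bigcap_{n \ge m} A^{\vartheta_n}_{a,s}$. Since the $\mu$-measure of the half-space is positive due to $s < \rho(a)$,
one of these intersections has positive $\mu$-measure. Thus, $c_{a,s} > 0$. Finally, the
inequalities

$$e^{\Lambda(\vartheta) - \langle\vartheta, a\rangle} = \int_{\mathbb{R}^d} e^{\langle\vartheta, x - a\rangle}\,\mu(dx) \ge \int_{A^\vartheta_{a,s}} e^{\langle\vartheta, x - a\rangle}\,\mu(dx) \ge e^{s\|\vartheta\|}\,\mu(A^\vartheta_{a,s})$$

are valid for any $\vartheta \in \mathrm{lin}(\mu)$ and imply (8). □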

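Proof of Lemma 8. Abbreviate $A^\vartheta_{a,r}$ to $A$ and abbreviate its
complement to $B$. Then

$$\langle\vartheta, b - a\rangle = \int_{\mathbb{R}^d} \langle\vartheta, x - a\rangle\,Q_\vartheta(dx) \ge \int_B \langle\vartheta, x - a\rangle\,Q_\vartheta(dx) + r\|\vartheta\|\,Q_\vartheta(A)$$

and, in turn,

$$\langle\vartheta, b - a\rangle - r\|\vartheta\| \ge \int_B \big[\langle\vartheta, x - a\rangle - r\|\vartheta\|\big]\,Q_\vartheta(dx) = e^{\langle\vartheta, a\rangle - \Lambda(\vartheta)} \int_B \big[\langle\vartheta, x - a\rangle - r\|\vartheta\|\big]\, e^{\langle\vartheta, x - a\rangle}\,\mu(dx).$$

Since $te^t \ge -1$ for each $t \in \mathbb{R}$ and $e^{\langle\vartheta, x - a\rangle} < e^{r\|\vartheta\|}$ for $x \in B$, the last integral
is bounded below by $-\mu(\mathbb{R}^d)\big[r\|\vartheta\| e^{r\|\vartheta\|} + 1\big]$. It follows that

$$\langle\vartheta, b - a\rangle - r\|\vartheta\| \ge -\mu(\mathbb{R}^d)\, e^{\langle\vartheta, a\rangle - \Lambda(\vartheta)}\big[r\|\vartheta\| e^{r\|\vartheta\|} + 1\big].$$

This completes the proof on account of (8). [The constant $C$ in (7) can be
chosen as $\mu(\mathbb{R}^d)/c_{a,s}$.] □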

Proof of Theorem 1. Let a sequence of p.m.’s with in 0 = 
dom(A) nlin(//) converge weakly to a p.m. P. Note that P(cs(/r)) = 1 because 
Q{cs{fi)) = 1 for every Q G T. 

In the case when the sequence 'dn is bounded, let 'd be the limit of an arbi¬ 
trary convergent subsequence. By Lemma 2(i), the weak convergence of 
along this subsequence implies id G dom(A), and since clearly d G lin(/z), also 
G 0. Moreover, P = by Lemma 2, and since the parametrization (3) of 
£ is bijective, it follows that each convergent subsequence of dn has the same 
limit. Thus dn converges to d. Applying Lemma 2, A{dn) —> A('i9) follows 
from the weak convergence of by (i), and P in variation by (ii). 
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In the case when ||'i?n|| oo, assume with no loss of generality that 
^9n/||i?n|| converges to some r, clearly in lin(/i). Suppose first that cs(//) is 
compact. Then the mean of exists and converges to the mean b of P. 
By Corollary 5, (r, b — a) > p{a) for each a G ri(//) and thus (r, x — 6) < 0 for 
each X € cs(/i). Since obviously b € cs{p), it follows that Hi, = {x: (r, x — b) = 
0} is a supporting hyperplane of cs(/r), and G = HbCi cs(/r) is a proper face 
of cs(/i), exposed by r. Then P(cs(/r)) = 1 implies P{G) =P{Hb) and this 
equals 1 because the mean 6 of P belongs to Hb- 

Turning to the situation when $\mathrm{cs}(\mu)$ is not compact, there exists a continuity
set $B$ of $P$ such that $P(B) > 0$ and $\mathrm{lin}(\mu^B) = \mathrm{lin}(\mu)$. Then the conditioned
p.m.'s $Q_{\vartheta_n}(\cdot \mid B)$ belong to the exponential family based on $\mu^B$, the
parameters $\vartheta_n$ in $\Theta = \Theta_\mu$ belong also to $\Theta_{\mu^B}$ and $Q_{\vartheta_n}(\cdot \mid B)$ converges weakly
to $P(\cdot \mid B)$. If, in addition, $B$ is compact, then $P(\cdot \mid B)$ has a mean $b$ and the
result proved above gives that $H_b$ is a nontrivial supporting hyperplane of
$\mathrm{cs}(\mu^B)$. Taking another compact continuity set $C$ of $P$ with $C \supseteq B$, let $c$ be
the mean of $P(\cdot \mid C)$ and let $H_c$ be the corresponding supporting hyperplane
of $\mathrm{cs}(\mu^C)$. Since $P(H_b \mid B) = 1$ and $P(H_c \mid C) = 1$ together with $C \supseteq B$ imply
$P(H_c \mid B) = 1$, the parallel hyperplanes $H_b$ and $H_c$ coincide. This proves that
$H_b$ is a nontrivial supporting hyperplane of $\mathrm{cs}(\mu^C)$ satisfying $P(H_b \mid C) = 1$
for those compact continuity sets $C$ of $P$ which contain $B$. Then $P(H_b) = 1$
and $H_b$ is a nontrivial supporting hyperplane to $\mathrm{cs}(\mu)$, as well, because each
$x \in \mathrm{ri}(\mu)$ belongs to $\mathrm{cs}(\mu^C)$ for some $C$ as above. Thus, $G = H_b \cap \mathrm{cs}(\mu)$ is a
proper face of $\mathrm{cs}(\mu)$ exposed by $\tau$ and $P(G) = 1$, the same conclusions as
when $\mathrm{cs}(\mu)$ was compact.

Finally, $\vartheta_n$ cannot have both a convergent subsequence and another
subsequence with norms tending to infinity. Indeed, by the above arguments,
the former implies $P \in \mathcal{E}$ and the latter implies $P(G) = 1$ for a proper face
$G$ of $\mathrm{cs}(\mu)$, a contradiction. □
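
The dichotomy of Theorem 1 can be seen numerically in a toy case. The sketch below is not from the paper: it assumes $\mu$ is the counting measure on the hypothetical support $\{0, 1, 2\}$, so that $\mathrm{cs}(\mu) = [0, 2]$, and it takes the variation distance to be the sum of absolute differences of the point masses. With bounded parameters the family members converge in variation to a member of the family; with $\vartheta_n = n$ the direction $\tau = 1$ exposes the face $\{2\}$ and the mass concentrates there.

```python
import math

# Hypothetical example: mu = counting measure on these three points of the real line.
SUPPORT = [0.0, 1.0, 2.0]

def q(theta):
    """Point masses of Q_theta, whose mu-density is exp(theta*x - Lambda(theta))."""
    m = max(theta * x for x in SUPPORT)          # subtract the largest exponent for stability
    w = [math.exp(theta * x - m) for x in SUPPORT]
    z = sum(w)
    return [v / z for v in w]

def variation(p, r):
    """Variation distance, taken here as the sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, r))

# Alternative (i): theta_n = 1 + 1/n is bounded and converges to theta = 1.
bounded = [variation(q(1.0 + 1.0 / n), q(1.0)) for n in (1, 10, 100, 1000)]

# Alternative (ii): theta_n = n, so theta_n/|theta_n| = tau = 1 exposes the face {2} of [0, 2].
mass_on_face = [q(float(n))[2] for n in (1, 5, 25, 125)]

print(bounded)
print(mass_on_face)
```

In this example the limit in the unbounded case is the point mass at 2, which is not a member of the family, in line with the final paragraph of the proof.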

5. Proofs of Lemmas 6 and 7. 

Proof of Lemma 6. By induction on the dimension of $\mathrm{cs}(\mu)$. The case
of zero dimension is trivial. The induction hypothesis assumes the assertion
is true for canonically convex exponential families based on measures whose
convex support has smaller dimension than that of the given $\mu$.

Given $Q_{\vartheta_n}$ with $\vartheta_n \in \Xi$ converging in variation distance to $P$, if the
alternative (i) in Theorem 1 takes place, then $P \in \mathcal{E}^F$ holds with $F = \mathrm{cc}(\mu)$,
obviously a $\Xi$-accessible face of $\mathrm{cc}(\mu)$. Otherwise, by Theorem 1(ii), $\|\vartheta_n\| \to +\infty$
and the limit $\tau_1$ of any convergent subsequence of $\vartheta_n/\|\vartheta_n\|$ exposes a proper
face $G$ of $\mathrm{cs}(\mu)$ with $P(G) = 1$. Note that $\vartheta_n \in \Theta \subseteq \mathrm{lin}(\mu)$ implies $\tau_1 \in \mathrm{lin}(\mu)$.
The supporting hyperplane $H = \{x : \langle \tau_1, x - a\rangle = 0\}$ that contains $G$ has
positive $\mu$-measure, since $P(H) = P(G) = 1$ and $\mu$ dominates the variation limit
$P$ of the p.m.'s $Q_{\vartheta_n}$. It follows by Lemma 1 that $F_1 = H \cap \mathrm{cc}(\mu)$ is a face
of $\mathrm{cc}(\mu)$, clearly exposed by $\tau_1$, and $\mu(H \setminus \mathrm{cl}(F_1)) = 0$. Thus $P(H) = 1$
implies $P(\mathrm{cl}(F_1)) = 1$. By the variation convergence $Q_{\vartheta_n} \to P$, it follows that
$Q_{\vartheta_n}(\mathrm{cl}(F_1)) \to 1$ and the conditioned p.m.'s $Q_{\vartheta_n}(\cdot \mid \mathrm{cl}(F_1))$ also converge to
$P$ in variation distance. These p.m.'s belong to $\mathcal{E}^{F_1}$ and thus can be written
as $Q_{F_1,\theta_n}$ with $\theta_n = \pi_{F_1}(\vartheta_n) \in \pi_{F_1}(\Xi)$. Hence, by the induction hypothesis
applied to the canonically convex exponential family

\[ \{Q_{F_1,\theta} : \theta \in \pi_{F_1}(\Xi)\}, \]

their variation limit $P$ belongs to $\mathcal{E}^F$ for a face $F$ of $\mathrm{cc}(\mu^{\mathrm{cl}(F_1)}) = F_1$, which
is $\pi_{F_1}(\Xi)$-accessible, that is, an access sequence $\tau_2, \ldots, \tau_m$ to the face $F$ of
$F_1$ is adapted to $\pi_{F_1}(\Xi)$. Since $\tau_1$ is the limit of a convergent subsequence of
$\vartheta_n/\|\vartheta_n\|$ with $\vartheta_n \in \Xi$, $\tau_1$ belongs to $\mathrm{rec}(\mathrm{ri}(\Xi))$ by [12], Theorem 8.2. This and
the adaptedness of $\tau_2, \ldots, \tau_m$ to $\pi_{F_1}(\Xi)$ imply by Lemma 3 that the access
sequence $\tau_1, \ldots, \tau_m$ to the face $F$ of $\mathrm{cc}(\mu)$ is adapted to $\Xi$. □

The following simple auxiliary assertion resembles [3], Lemma 21.7, and 
[7], Lemmas 6 and 7(i). 

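Lemma 10. If $\tau$ exposes a face $G$ of $\mathrm{cc}(\mu)$, then $\tau$ belongs to the recession
cone of $\mathrm{dom}(\Lambda_\mu)$ and for every $\vartheta \in \mathrm{dom}(\Lambda_\mu)$, the sequence $Q_{\vartheta+n\tau}$ converges
neatly to $Q_\vartheta(\cdot \mid \mathrm{cl}(G)) \in \mathcal{E}^G$.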


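Proof. Since $\tau$ exposes $G$, $\langle \tau, x\rangle \le \langle \tau, a\rangle$ for all $x \in \mathrm{cc}(\mu)$ and $a \in G$,
with equality if and only if $x \in G$. Then, for $t \ge 0$ the function $e^{\langle \vartheta + t\tau, x - a\rangle}$
of $x \in \mathrm{cs}(\mu)$ is bounded above by $e^{\langle \vartheta, x - a\rangle}$. This implies that $\vartheta + t\tau \in
\mathrm{dom}(\Lambda_\mu)$ whenever $\vartheta \in \mathrm{dom}(\Lambda_\mu)$ and $t \ge 0$, proving $\tau \in \mathrm{rec}(\mathrm{dom}(\Lambda_\mu))$.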

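Knowing from Lemma 1 that $\mu(\mathrm{cl}(G))$ equals the $\mu$-measure of the supporting
hyperplane $H = \{x : \langle \tau, x - a\rangle = 0\}$ with $a \in G$, we have for any
$\vartheta \in \mathrm{dom}(\Lambda_\mu)$

\[
\Lambda_\mu(\vartheta + n\tau) - \langle \vartheta + n\tau, a\rangle
= \ln \int_{\mathrm{cs}(\mu)} e^{\langle \vartheta + n\tau,\, x - a\rangle}\,\mu(dx)
= \ln \Bigl[ \int_{\mathrm{cl}(G)} e^{\langle \vartheta,\, x - a\rangle}\,\mu(dx)
  + \int_{\mathrm{cs}(\mu) \setminus H} e^{\langle \vartheta + n\tau,\, x - a\rangle}\,\mu(dx) \Bigr].
\]

When $n$ tends to infinity, $\Lambda_\mu(\vartheta + n\tau) - \langle \vartheta + n\tau, a\rangle$ decreases to $\Lambda_G(\vartheta) -
\langle \vartheta, a\rangle$, since the integral over $\mathrm{cs}(\mu) \setminus H$ decreases to zero by dominated
convergence. This fact and $\langle \tau, x - a\rangle = 0$ for $x \in \mathrm{cl}(G)$ imply

\[
Q_{\vartheta + n\tau}(\mathrm{cl}(G))
= \int_{\mathrm{cl}(G)} e^{\langle \vartheta + n\tau,\, x\rangle - \Lambda_\mu(\vartheta + n\tau)}\,\mu(dx)
\;\longrightarrow\; \int_{\mathrm{cl}(G)} e^{\langle \vartheta,\, x\rangle - \Lambda_G(\vartheta)}\,\mu(dx) = 1
\]

and $Q_{\vartheta + n\tau}(\cdot \mid \mathrm{cl}(G)) = Q_\vartheta(\cdot \mid \mathrm{cl}(G))$. Thus the neat convergence follows. □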

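The two facts used at the end of the proof of Lemma 10 can be checked numerically in a small case. The following is a minimal sketch under assumptions that are not in the paper: $\mu$ is the counting measure on the vertices of the unit square (so $\mathrm{cc}(\mu)$ is the square), $\tau = (1, 0)$ exposes the edge $G = \{x_1 = 1\}$, and the starting parameter `theta` is an arbitrary illustrative choice. The code tracks $Q_{\vartheta+n\tau}(\mathrm{cl}(G))$ and compares the conditionings $Q_{\vartheta+n\tau}(\cdot \mid \mathrm{cl}(G))$ with $Q_\vartheta(\cdot \mid \mathrm{cl}(G))$.

```python
import math

# Hypothetical example: mu = counting measure on the vertices of the unit square.
POINTS = [(0, 0), (1, 0), (0, 1), (1, 1)]
TAU = (1.0, 0.0)                               # exposes the face G = edge {x1 = 1}
FACE = [p for p in POINTS if p[0] == 1]        # support points lying in cl(G)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def q(theta):
    """Q_theta as a dict mapping each support point to its probability."""
    m = max(dot(theta, p) for p in POINTS)     # stabilise the exponentials
    w = {p: math.exp(dot(theta, p) - m) for p in POINTS}
    z = sum(w.values())
    return {p: v / z for p, v in w.items()}

def conditioned(dist, subset):
    """Conditioning of a distribution on a subset of the support."""
    mass = sum(dist[p] for p in subset)
    return {p: dist[p] / mass for p in subset}

theta = (0.3, -0.7)                            # arbitrary illustrative parameter
target = conditioned(q(theta), FACE)           # Q_theta( . | cl(G))

face_mass = []                                 # Q_{theta + n tau}(cl(G)) for growing n
max_gap = 0.0                                  # largest deviation of the conditionings
for n in (0, 1, 5, 20, 40):
    shifted = q((theta[0] + n * TAU[0], theta[1] + n * TAU[1]))
    face_mass.append(sum(shifted[p] for p in FACE))
    cond = conditioned(shifted, FACE)
    max_gap = max(max_gap, max(abs(cond[p] - target[p]) for p in FACE))

print(face_mass, max_gap)
```

In this example the face mass increases to 1 while the conditioning on the face stays fixed up to rounding, which is the pattern the proof derives in general.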
Proof of Lemma 7. By induction. As in the proof of Lemma 6, the
induction hypothesis assumes the assertion is true for exponential families
based on measures whose convex support has smaller dimension than that
of $\mu$.

The assertion trivially holds if $F = \mathrm{cc}(\mu)$. Thus suppose $F$ is a proper face
of $\mathrm{cc}(\mu)$ and let $\tau_1, \ldots, \tau_m$ be an access sequence to $F$ adapted to $\Xi$. Let
$F_1$ be the face of $\mathrm{cc}(\mu)$ exposed by $\tau_1$. Then, by Lemma 3, $\tau_1 \in \mathrm{rec}(\mathrm{ri}(\Xi))$
and the access sequence $\tau_2, \ldots, \tau_m$ to the face $F$ of $F_1$ is adapted to $\pi_{F_1}(\Xi)$.
Thus $F$ is a $\pi_{F_1}(\Xi)$-accessible face of $F_1$.

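To prove that for $\vartheta \in \mathrm{ri}(\Xi)$ there exists a sequence of p.m.'s in $\mathcal{E}_{\mathrm{ri}(\Xi)}$
that converges neatly to $Q_\vartheta(\cdot \mid \mathrm{cl}(F))$, apply the induction hypothesis to
the exponential family based on $\mu^{\mathrm{cl}(F_1)}$ with convex core $F_1$, to the
$\pi_{F_1}(\Xi)$-accessible face $F$ of $F_1$ and to $\theta = \pi_{F_1}(\vartheta)$ in $\pi_{F_1}(\mathrm{ri}(\Xi)) = \mathrm{ri}(\pi_{F_1}(\Xi))$.
It follows that some sequence $Q_{F_1,\theta_n}$ in $\mathcal{E}^{F_1}$ with $\theta_n \in \pi_{F_1}(\mathrm{ri}(\Xi))$ converges
neatly to the conditioning on $\mathrm{cl}(F)$ of $Q_{F_1,\theta}$, which equals $Q_\vartheta(\cdot \mid \mathrm{cl}(F))$ since
$\theta = \pi_{F_1}(\vartheta)$. Here, on account of $\theta_n \in \pi_{F_1}(\mathrm{ri}(\Xi))$, the p.m. $Q_{F_1,\theta_n}$ equals
$Q_{\vartheta_n}(\cdot \mid \mathrm{cl}(F_1))$ for some $\vartheta_n \in \mathrm{ri}(\Xi)$.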

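Since $\tau_1 \in \mathrm{rec}(\mathrm{ri}(\Xi))$, Lemma 10 gives that, as $k \to \infty$, the sequence $Q_{\vartheta_n + k\tau_1}$
converges neatly to $Q_{\vartheta_n}(\cdot \mid \mathrm{cl}(F_1)) = Q_{F_1,\theta_n}$, where each $Q_{\vartheta_n + k\tau_1}$ is in $\mathcal{E}_{\mathrm{ri}(\Xi)}$ due to
$\tau_1 \in \mathrm{rec}(\mathrm{ri}(\Xi))$. The last assertion and the neat convergence of $Q_{F_1,\theta_n}$ to
$Q_\vartheta(\cdot \mid \mathrm{cl}(F))$ imply that for a suitable sequence $k_n \to \infty$, the p.m.'s $Q_{\vartheta_n + k_n\tau_1}$
in $\mathcal{E}_{\mathrm{ri}(\Xi)}$ converge neatly to $Q_\vartheta(\cdot \mid \mathrm{cl}(F))$. □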

6. Variation convergence in $\mathrm{ext}(\mathcal{E})$. In this section Theorem 4 is proved.
An auxiliary lemma comes first.

Lemma 11. If $\mu$ dominates a p.m. $P$, then:

(i) There exists a face $F$ of $\mathrm{cc}(\mu)$ with $P(\mathrm{cl}(F)) = 1$ such that all faces
$G$ of $\mathrm{cc}(\mu)$ with $P(\mathrm{cl}(G)) = 1$ contain $F$.

(ii) If $P(\mathrm{cl}(F_n)) \to 1$ for a sequence $F_n$ of proper faces of $\mathrm{cc}(\mu)$, then the
face $F$ of (i) is proper.

Proof. (i) The closure $\mathrm{cs}(\mu)$ of $\mathrm{cc}(\mu)$ has full $\mu$-measure, hence also full
$P$-measure due to domination. Thus, the face $G = \mathrm{cc}(\mu)$ of $\mathrm{cc}(\mu)$ satisfies
$P(\mathrm{cl}(G)) = 1$. Consider any face $G$ with the last property and let $F$ be a face
with that property whose dimension is smallest. Then both $\mu^{\mathrm{cl}(F)}$ and $\mu^{\mathrm{cl}(G)}$
dominate $P$, hence so does the restriction of $\mu$ to $\mathrm{cl}(F) \cap \mathrm{cl}(G)$. By [6],
Corollary 4, this intersection has the same $\mu$-measure as $\mathrm{cl}(F \cap G)$. Therefore,
the restriction of $\mu$ to $\mathrm{cl}(F \cap G)$ dominates $P$ and thus $P(\mathrm{cl}(F \cap G)) = 1$.
The minimality of the dimension of $F$ implies $F \subseteq G$.

(ii) The proper faces in the hypotheses can be supposed to be exposed.
Thus let a unit vector $\tau_n$ from $\mathrm{lin}(\mu)$ expose $F_n$ of $\mathrm{cc}(\mu)$. Then for $a \in \mathrm{ri}(\mu)$
the closed half-space $\{x : \langle \tau_n, x - a\rangle \le 0\}$ is disjoint with $\mathrm{cl}(F_n)$; thus its
$P$-measure is at most $1 - P(\mathrm{cl}(F_n))$. It can be assumed that $\tau_n \to \tau$ and then,
as in the proof of Lemma 9,

\[ \{x : \langle \tau, x - a\rangle < 0\} \subseteq \bigcup_{m \ge 1} \bigcap_{n \ge m} \{x : \langle \tau_n, x - a\rangle < 0\}. \]

Since $P(\{x : \langle \tau_n, x - a\rangle < 0\}) \le 1 - P(\mathrm{cl}(F_n))$ and $P(\mathrm{cl}(F_n)) \to 1$, the open
half-space on the left-hand side has $P$-measure zero whenever $a \in \mathrm{ri}(\mu)$.
Hence, on account of $P(\mathrm{cs}(\mu)) = 1$, $\tau$ exposes a proper face of $\mathrm{cs}(\mu)$ that
has full $P$-measure. Thus, there exists a nontrivial supporting hyperplane
$H$ of $\mathrm{cs}(\mu)$ with $\mu(H) > 0$. By Lemma 1, $G = H \cap \mathrm{cc}(\mu)$ is a proper face of
$\mathrm{cc}(\mu)$ and $\mu(H \setminus \mathrm{cl}(G)) = 0$. It follows that $P(\mathrm{cl}(G)) = P(H) = 1$. Hence, $G$
contains the face $F$ of (i), which implies that $F$ is proper. □
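
A small worked instance may help fix the meaning of Lemma 11(i). The example below is an illustration chosen here, not one from the paper: $\mu$ is assumed to be the counting measure on the four vertices of the unit square, and $P$ is assumed uniform on two adjacent vertices.

```latex
% Assumed toy data (not from the paper).
\[
\mu = \sum_{v \in \{0,1\}^2} \delta_v, \qquad
\mathrm{cc}(\mu) = [0,1]^2, \qquad
P = \tfrac12\,\delta_{(1,0)} + \tfrac12\,\delta_{(1,1)} .
\]
% Faces G of cc(mu) whose closure has full P-measure:
\[
G = [0,1]^2 \quad\text{and}\quad G = \{1\} \times [0,1] .
\]
% Every other edge and every vertex satisfies P(cl(G)) \le 1/2,
% so the face of Lemma 11(i) is the edge
\[
F = \{1\} \times [0,1], \qquad \dim F = 1 ,
\]
% which is contained in both faces listed above and is a proper face of cc(mu).
```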

Proof of Theorem 4. The variation limit $P$ of p.m.'s $Q_n$ in $\mathrm{ext}(\mathcal{E})$
is obviously dominated by $\mu$; thus Lemma 11(i) applies to this $P$. Let $F$ be
the smallest face of $\mathrm{cc}(\mu)$ with closure of full $P$-measure. The variation
convergence $Q_n \to P$ implies $Q_n(\mathrm{cl}(F)) \to P(\mathrm{cl}(F)) = 1$, and $Q_n \in \mathcal{E}^{F_n}$
implies $Q_n(\mathrm{cl}(F_n)) = 1$. Since $Q_n(\mathrm{cl}(F_n) \cap \mathrm{cl}(F))$ equals $Q_n(\mathrm{cl}(F_n \cap F))$ by
[6], Corollary 4, it follows that $Q_n(\mathrm{cl}(F_n \cap F)) \to 1$. Thus, again by the
variation convergence, also $P(\mathrm{cl}(F_n \cap F)) \to 1$. If a subsequence of $F_n \cap F$
consisted of proper faces of $F$, the last limit relationship would imply, by
Lemma 11(ii), applied to $\mu^{\mathrm{cl}(F)}$ in the role of $\mu$, the existence of a proper face
of $F = \mathrm{cc}(\mu^{\mathrm{cl}(F)})$ with closure of full $P$-measure, a contradiction to the choice
of $F$. This proves that $F_n$ eventually contains $F$. The last inclusion implies
that the conditioning $Q_n(\cdot \mid \mathrm{cl}(F))$ of $Q_n \in \mathcal{E}^{F_n}$ belongs to $\mathcal{E}^F$, and since
$Q_n(\mathrm{cl}(F)) \to 1$, these conditionings also converge to $P$ in variation distance.
Finally, applying Theorem 1 to the p.m.'s $Q_n(\cdot \mid \mathrm{cl}(F))$ in $\mathcal{E}^F$, the alternative
(ii) is ruled out by $F$ not having proper faces with closure of full $P$-measure,
and it follows that $P \in \mathcal{E}^F$. □